ABSTRACT

This  study  compares  two  methods  of  statistical  sampling  for  application  in  a 
contracting  context.  The  methods  are  compared  with  the  intent  of  demonstrating  the 
superiority  of  one  method  over  the  other  in  assisting  price  analysts  and  contract 
negotiators in expediting processing of proposals for change orders while maintaining
acceptable  levels  of  risk.  The  Basket  Method  and  Stratified  Random  Sampling 
techniques  are  examined  to  determine  which  method  allows  a  more  accurate  estimate 
of  a  proposal  population  to  be  made.  The  several  populations  used  in  the  simulation 
have  errors  planted  to  represent  both  random  "honest"  mistakes  and  weighted 
"dishonest"  mistakes.  The  author  concludes  that  the  Basket  Method  has  a  more 
desirable  accuracy  pattern  than  the  Stratified  Random  Sampling  Technique. 


TABLE OF CONTENTS

I. INTRODUCTION
   A. PURPOSE OF THIS STUDY
   B. NEED FOR THIS STUDY
   C. METHODOLOGY
II. THE BASKET METHOD
   A. HISTORY
   B. DESCRIPTION
      1. Basket Assignment
      2. Estimating Negotiated Prices for Unsampled Proposals
III. STRATIFIED RANDOM SAMPLING
   A. THE CASE FOR STRATIFICATION
   B. DESCRIPTION OF STRATIFIED RANDOM SAMPLING
      1. Establish the Desired Precision and Reliability
      2. Designate the Strata and Strata Boundaries
      3. Sample Size Determination and Allocation
      4. Select a Random Sample of Size n_i From the Strata
      5. Calculate the Mean of Each Stratum Based On n_i for Each Stratum
      6. Calculate the Estimated Audited Population Total
      7. Check Reliability of the Estimated Audited Population Total
IV. DESCRIPTION OF SIMULATION
   A. DERIVING COMPARABLE RESULTS
   B. CREATION OF THE TEST POPULATIONS
   C. SIMULATION EXECUTION
      1. Stratify the Population
      2. Allocate the Total Sample to the Strata
      3. Select a Random Sample From Each Strata
      4. Calculate the "Book Value" Sum for Audited Items
      5. Calculate the "Audit Value" Sum for Audited Items
      6. Calculate the Correction Factor
      7. Calculate the Predicted Population Audit Total
      8. Calculate the Percent Error
V. SUMMARY AND CONCLUSIONS
   A. RESISTANCE TO PROPOSAL RIGGING
   B. EVALUATION
   C. AREAS FOR FURTHER RESEARCH
APPENDIX A: BASKET METHOD PROGRAM LISTING
APPENDIX B: MINITAB SIMULATION OF STRATIFIED RANDOM SAMPLING
APPENDIX C: DETAILED RESULTS
LIST OF REFERENCES
INITIAL DISTRIBUTION LIST


LIST OF TABLES

1. POPULATION DESCRIPTION
2. SIMULATION RESULTS (MEAN % ERROR)
3. BASKET METHOD ERROR AS A % OF STRATIFIED RANDOM SAMPLING ERROR
4. STANDARD DEVIATION OF MEAN PERCENT ERROR


I.  INTRODUCTION 

A.  PURPOSE  OF  THIS  STUDY 

The  purpose  of  this  study  is  to  examine  two  methods  of  statistical  sampling 
which  may  have  application  in  assisting  government  price  analysts  and  contract 
negotiators  in  expediting  processing  of  proposals  for  change  orders.  The  study  will 
describe  how  and  why  the  use  of  statistical  sampling  may  expedite  proposal  negotiation 
while  maintaining  acceptable  levels  of  risk,  and  will  determine  which  of  the  two 
sampling  methods  examined  provides  more  acceptable  results  under  the  given 
conditions. 

B.  NEED  FOR  THIS  STUDY 

Current  defense  acquisition  procedures  often  involve  situations  in  which 
Department  of  Defense  (DOD)  agencies  must  deal  with  a  sole  source  supplier  in 
buying material or services. In major weapon system acquisitions, for example, the
Department of Defense typically issues a large number of change orders to modify an
existing contract. A lead ship or aircraft production contract may generate over 10,000
change orders. Why must such a large volume of change orders be issued? After a
prime contract is awarded and production begins, design changes are often necessitated
by a change in performance requirements requested by the government or by unforeseen
technical problems, which almost always seem to crop up. Each design change requires
a  modification  to  the  prime  contract  called  a  change  order.  In  each  case,  the 
contractor  prepares  a  proposal  reflecting  his  estimate  of  what  the  requested  change  will 
cost.  The  two  parties  (government  and  contractor)  must  then  negotiate  a  price  for 
each  change;  and,  because  the  prime  contractor  is  the  logical  one  to  incorporate  the 
requested  change,  there  is  no  competition  to  help  assure  that  the  government  receives 
the fairest possible price. The only mechanisms working to assure a fair price for the
change  order  are  the  adequacy  of  the  contractor's  estimating  procedures,  the 
contractor's  inherent  honesty  and  desire  to  provide  a  good  product  at  a  fair  price,  and 
the  analysis  of  the  proposal  by  government  price  analysts. 

Federal  Acquisition  Regulations  (FAR)  require  the  government  to  analyze  each 
proposal  prior  to  negotiation  to  assure  that  the  proposal  represents  a  fair  price.  The 
analysis and negotiation of costs for each proposal is done by some cognizant
government agency. Often, a group of government employees have been assigned to
perform  such  functions  in  residence  at  the  contractor's  plant.  The  volume  of  work 
thus  generated  and  the  amount  of  money  involved  are  quite  substantial.  This  volume 
of  work  combined  with  a  lack  of  sufficient  numbers  of  government  analysts  leads  to 
large  backlogs  of  unprocessed  proposals.  To  perform  a  really  thorough  analysis  and 
patient  negotiation  takes  much  more  time  than  government  analysts  are  currently  able 
to  give  to  a  proposal.  If  analysts  do  try  to  take  more  time  and  be  more  thorough,  they 
fall  still  farther  behind  as  the  backlog  continues  to  grow.  Therefore,  there  is 
tremendous  pressure  on  analysts  to  expedite  their  work  even  though  it  is  generally 
recognized  that  hurried  analysis  and  negotiation  can  result  in  costly  overpayment  since 
quickness  commonly  works  against  thoroughness  and  accuracy  [Ref.  1:  pg.  2]. 

Unprocessed  proposals  can  result  in  extra  expense  for  the  contractor  too.  In 
many  situations  involving  ongoing  production  or  repair  work,  the  proposed  work  is 
begun  before  the  proposal  is  analyzed  and  negotiated  to  avoid  expensive  delay  and 
disruption  costs.  The  contractor,  however,  except  for  partial  advances  called  "progress 
payments",  is  not  paid  until  after  the  proposal  is  processed.  Because  of  this,  the 
contractor  may  have  to  borrow  working  capital  to  cover  funds  tied  up  in  the  backlog 
thus  suffering  capital  costs. 

It  is  generally  recognized  that  it  is  in  the  best  interest  of  both  parties  to  expedite 
the  processing  of  the  proposals  without  sacrificing  accuracy.  If  allowed  by  the 
regulations,  analyzing  and  negotiating  a  sample  of  proposals  selected  with  an  effective 
statistically-based  sampling  technique  could  have  these  effects.  If  a  suitable  sample  of 
proposals  were  selected  from  the  backlog  and  carefully  analyzed  and  negotiated,  the 
resulting  data  could  be  extrapolated  to  estimate  what  the  results  would  have  been  had 
every  proposal  received  the  same  treatment. 

The  reason  for  auditing  the  proposal  population  is  to  ensure  the  proposals  reflect 
costs  that  are  fair  and  reasonable.  During  an  audit  the  government  analyst  will  find 
that  one  of  three  possible  conditions  exists.  First,  the  government  may  feel  that  the 
proposal  has  been  understated,  i.e.,  the  contractor's  proposed  cost  for  a  change  to  the 
contract  is  less  than  the  actual  cost  the  contractor  will  incur.  Second,  the  government 
may  conclude  that  the  contractor's  proposal  is  overstated.  Third,  the  audit  findings 
may  conclude  that  the  proposal  is  reasonable.  Any  overstatement  or  understatement  is 
considered to be an error in the proposal population. A sampling technique that
allowed no sampling error (sampling error being defined as the chance that a
statistically selected and evaluated sample will lead to the wrong conclusion or to an
inaccurate projection) would select a sample from the population of proposals which,
when audited, would always give an exactly correct estimated value for the population
as a whole, no matter what the degree or distribution of errors in the proposal
population. Since sampling errors are due entirely to chance and are inherent in any
sampling process, we cannot expect a sample of "n" proposals from the proposal
population to provide an error-free characterization of the "N" proposals in
the population.

Assuming, then, that there will be some degree of error in the prediction of the
true value of the entire proposal population whenever a sampling technique is used, the
behavior of the degree of error must be predictable and exhibit certain qualities in
order for the sampling technique to be considered appropriate for the purpose
described above.

Specifically,  the  degree  of  error  should  not  be  easily  altered  by  the  distribution, 
size,  or  type  (overstatement, understatement)  of  errors  found  in  the  population.  If 
certain  patterns  of  errors  caused  the  entire  population  to  be  evaluated  as  understated 
then  the  government  would  pay  more  than  a  fair  price  for  the  changes  described  by  the 
population  of  proposals  which  contained  those  errors  [Ref.  2:  pg.  7].  Since  both  parties 
to  the  negotiation  would  have  to  enter  into  a  binding  agreement  to  abide  by  the  results 
of  using  statistical  methods,  all  aspects  of  the  sampling  and  estimation  process  must  be 
disclosed in advance. With this necessary advance knowledge, a shrewd contractor
could  carefully  seed  his  proposal  population  with  deliberate  errors  of  the  appropriate 
size,  type,  and  distribution  and  thereby  be  awarded  a  larger  payment  from  the 
government. 

Therefore,  the  desired  sampling  technique  will  not  necessarily  be  the  one  which 
results in the most accurate average estimate for various error arrangements in the
proposal  population.  It  will  instead  be  the  method  which  responds  least  to  variations 
in  the  arrangement  of  errors. 

C.       METHODOLOGY 

As  mentioned  previously,  the  purpose  of  this  study  is  to  examine  the  effectiveness 
of  two  sampling  techniques  that  appear  to  be  most  suitable  for  the  purpose  of 
analyzing  contract  change  orders.  The  two  sampling  techniques  to  be  studied  are 
Stratified  Random  Sampling  and  the  Basket  Method.  In  this  study,  the  two  methods 
will be used to draw samples from populations for evaluation. The populations were
previously used in a joint study of the American Institute of Certified Public
Accountants  and  the  American  Statistical  Association  [Ref.  3].  The  data  consist  of  two 
columns  of  values  which  represent  the  proposed,  or  book  value  of  a  contract  change 
and  the  audited  or  true  value  of  the  change.  The  populations  are  rigged  with  either 
random  or  planned  errors.  The  samples  drawn  by  the  two  methods  from  each 
population  will  be  evaluated  and  compared  to  determine  which  method  gives  a  better 
estimate  of  the  whole  population  according  to  the  goals  described  above.  Both  the 
error  rigging  and  evaluation  steps  are  explained  further  in  the  description  of  the 
simulation. 

The  amount  of  work  associated  with  auditing  is  more  closely  correlated  to  the 
number  of  items  being  audited  than  to  the  total  dollar  value  of  all  the  items  being 
audited.  Therefore,  the  sampling  rules  of  the  two  methods  will  be  adjusted  so  that  they 
will  draw  samples  with  the  same  number  of  proposals  from  each  population.  The 
results  of  this  study  will  then  indicate  which  method  yields  the  more  desirable 
prediction  while  holding  the  cost  of  the  audit  constant. 
II. THE BASKET METHOD

A.  HISTORY 

The  "Basket  Method"  of  sample  selection  was  developed  by  Dr.  K.  T.  Wallenius, 
Professor  of  Mathematical  Sciences  at  Clemson  University.  Development  of  the 
Basket  Sampling  method  was  sponsored  by  the  Office  of  Naval  Research  and  Naval 
Material  Command  and  funded  by  the  Office  of  Naval  Research  under  its  Acquisition 
Research  program.  The  Basket  Method  was  developed  as  a  potential  tool  to  assist 
price  analysts  and  contract  negotiators  in  expediting  processing  of  proposals  for  change 
orders  when  dealing  with  a  sole  source  supplier. 

B.  DESCRIPTION 

The  name  "Basket  Method"  is  derived  from  the  manner  in  which  the  population 
is  partitioned  into  separate  groups  (baskets)  prior  to  randomly  selecting  one  of  the 
baskets  as  the  sample.  The  goal  of  partitioning  the  population  into  baskets  by  the 
basket  assignment  process  is  to  make  each  basket  a  good  representation  of  the 
population  as  a  whole.  It  must  be  stressed  at  this  point  that  "representative"  should  be 
thought  of  in  terms  of  bid  prices  only.1  Because  each  basket  is  representative  of  the 
population  as  a  whole,  the  spread  and  proportion  of  proposal  values  will  be  nearly 
identical  to  those  of  the  population.  Therefore,  it  makes  no  difference  which  basket  is 
selected  to  be  audited  in  detail.  The  following  example  will  describe  the  use  of  the 
basket  method  technique. 

1. Basket Assignment

Imagine having a population of 100 proposals (N = 100) from which a 10%
sample (n = 10) is to be selected. The proposals are first arranged in order of
decreasing bid price and numbered accordingly; that is, the proposal with the largest
bid price is number 1, the second largest number 2, and so on. The proposals are now
ready to be separated into 10 different baskets. Starting with proposals 1 through 10
(those with the largest bid prices), one proposal is placed in each basket. Each
successive group of 10 proposals is assigned, one per basket, using the following rule:
the largest unassigned proposal is placed in the basket with the smallest sum of bid prices.
For the second group of 10 proposals, this rule results in pairing proposal 11 with 10,
12 with 9, ..., and 20 with 1. Basket subtotals are then calculated and the assignment
rule applied to the third group of 10 proposals. This is repeated until all the proposals
have been assigned. [Ref. 1: pg. 10]

1 It is realized there may be other relevant factors besides bid price that should be
considered in the definition of "representative". For the purposes of this paper,
however, it will suffice to say that sophisticated software can quickly balance baskets
for type of work, degree of labor intensity, level of technology, etc. In short, whatever
characteristics are identified as potentially important to the value of an audit will be
"balanced" by the basket method where possible.

Because basket totals are balanced at each stage of the basket assignment
process, the resulting baskets should have nearly equal totals. Should
additional balancing be required, the previously mentioned computer program can be
used (via a swapping algorithm) to bring basket totals into closer agreement.
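The assignment rule described above can be sketched in a few lines of Python. This is only an illustrative sketch, not the program listed in Appendix A: the function name `assign_baskets` and its arguments are hypothetical, and the optional swapping step for further balancing is not included.

```python
def assign_baskets(bid_prices, num_baskets):
    """Partition bid prices into baskets, one proposal per basket per round."""
    # Proposal 1 is the largest bid price, proposal 2 the next largest, etc.
    ordered = sorted(bid_prices, reverse=True)
    baskets = [[] for _ in range(num_baskets)]
    totals = [0.0] * num_baskets

    # Work through the proposals in successive groups of `num_baskets`.
    for start in range(0, len(ordered), num_baskets):
        group = ordered[start:start + num_baskets]
        # Rank baskets from smallest to largest running total. Each basket
        # receives at most one proposal per group, so the largest unassigned
        # proposal always goes to the smallest eligible basket.
        ranking = sorted(range(num_baskets), key=lambda b: totals[b])
        for price, b in zip(group, ranking):
            baskets[b].append(price)
            totals[b] += price
    return baskets, totals


# Hypothetical population: 100 proposals with bid prices 1 through 100.
baskets, totals = assign_baskets(range(1, 101), 10)
print(totals)
```

In this made-up population the second group pairs proposal 11 with proposal 10 and proposal 20 with proposal 1, as in the description; real bid prices would rarely balance this neatly, which is where the swapping step would come in.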
2.  Estimating  Negotiated  Prices  for  Unsampled  Proposals 

After the baskets are formed, one is selected at random and all its proposals
are audited and negotiated. Using the results of the sample negotiation, the sample
ratio factor, R, is computed as in equation 2.1.

$$R = \frac{\text{Total negotiated price of sample}}{\text{Total bid price of sample}} \qquad \text{(eqn 2.1)}$$

The total proposal value of the population is then multiplied by the sample ratio
factor, R, to determine the population audit result. This value will be the estimated
true value of the population.2


2 The sample ratio factor could also be applied individually to each unsampled
proposal, the values summed and the total added to the sum of the negotiated values
of sampled proposals. The result would be the same.
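The ratio estimate of equation 2.1, and the equivalence noted in the footnote, can be illustrated with a short Python sketch. The function names and the dollar figures are invented for illustration only and do not come from the study's data.

```python
import math


def ratio_estimate(sample_bids, sample_negotiated, population_bid_total):
    """Apply the sample ratio factor R to the total bid value of the population."""
    ratio = sum(sample_negotiated) / sum(sample_bids)
    return ratio * population_bid_total


def itemwise_estimate(sample_bids, sample_negotiated, unsampled_bids):
    """Apply R to each unsampled proposal and add the negotiated sample values."""
    ratio = sum(sample_negotiated) / sum(sample_bids)
    return sum(sample_negotiated) + sum(ratio * bid for bid in unsampled_bids)


# Hypothetical figures: a sampled basket of three proposals and two unsampled ones.
sample_bids = [100.0, 50.0, 50.0]
sample_negotiated = [90.0, 45.0, 45.0]
unsampled_bids = [300.0, 500.0]
population_bid_total = sum(sample_bids) + sum(unsampled_bids)

whole = ratio_estimate(sample_bids, sample_negotiated, population_bid_total)
itemwise = itemwise_estimate(sample_bids, sample_negotiated, unsampled_bids)
print(whole, itemwise, math.isclose(whole, itemwise))
```

The two figures agree because R multiplied by the sample's total bid price is, by definition, the sample's total negotiated price.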
III. STRATIFIED RANDOM SAMPLING

A.       THE  CASE  FOR  STRATIFICATION 

Stratified  random  sampling  is  similar  in  many  respects  to  the  technique  of 
unrestricted  random  sampling.3  The  major  difference  is  that  the  population  is  divided 
into  two  or  more  groups  (strata),  each  of  which  is  then  sampled  separately.  The  results 
can  then  be  combined  to  give  an  estimate  of  the  total  population  value. 

The  primary  objective  of  stratification  in  auditing  is  to  reduce  the  impact  of  the 
population  variance  on  the  sampling  plan.  Basically,  a  population  of  heterogeneous 
items  (a  population  with  large  variance)  is  broken  into  two  or  more  groups  or  strata  of 
a  more  homogeneous  nature  (groups  with  small  variances).  The  total  population 
variance is unaffected by this process. However, it should be intuitively clear that
within each group so constructed, the stratum variance will be smaller than the
population variance. [Ref. 4: pg. 149]

To illustrate, suppose a population consists of seven items: five have a value of
$1 each, and two have a value of $3 each. The variance of this population is close to
1, but by forming two strata with the five items valued at $1 each in one stratum and
the remaining two items of value $3 in the other stratum, the variance of each stratum
is 0. This reduction in variance by the formation of two strata has important
implications for the amount of sampling error and the size of the sample required. The
relationship can be summarized as follows: Given any population of size N, the lower
the variability, the smaller the sample size required to achieve any given precision4 and
reliability5 requirements. [Ref. 5: pg. 12]
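The seven-item illustration can be checked numerically. The sketch below assumes the quoted population variance is the usual sample variance (divisor n - 1); the variable names are chosen here for illustration.

```python
from statistics import pvariance, variance

# Five items worth $1 and two items worth $3.
population = [1.0, 1.0, 1.0, 1.0, 1.0, 3.0, 3.0]
stratum_low = [1.0] * 5
stratum_high = [3.0] * 2

# Variance of the unstratified population (divisor n - 1, an assumption).
population_variance = variance(population)

# Within each stratum every item has the same value, so the variance vanishes.
stratum_variances = [pvariance(stratum_low), pvariance(stratum_high)]

print(round(population_variance, 3), stratum_variances)
```

With the divisor n - 1 the population variance is 40/42, or roughly 0.95, while both stratum variances are exactly zero.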

While the above example is very simplistic and hypothetical in nature, it does
illustrate the fact that by taking a relatively heterogeneous population and dividing it
up into homogeneous groups the variance of each group will be smaller than that of
the original population. As a result, the sample size required will be smaller than if
unrestricted random samples were taken; or alternatively, the reliability would be
higher or the precision limits narrower. Stratification should therefore be applied to
heterogeneous populations which can be divided into fairly uniform strata on the basis
of some criterion that affects the variable being studied. Under these circumstances,
stratification usually achieves greater precision for a given cost. On the other hand,
stratification is unnecessary in homogeneous populations where there are no discernible
strata that will affect the results.

3 The principle involved in unrestricted random sampling is that every element in
the population should have an equal chance of being included in the sample. Since
"randomness" is difficult to achieve without some kind of aid, a random number table
or a computerized random number generator is often used to ensure random selection.

4 The range within which the true answer most likely falls.

5 The likelihood that the true answer will fall within the established range. It is
usually expressed as a percentage, being the number of times out of one hundred that
the true answer would be contained within the determined margins.

To  use  stratified  sampling,  three  general  rules  must  be  adhered  to  [Ref.  6:  pg.  96]: 

1.  Every  element  must  belong  to  one  and  only  one  stratum. 

2.  There  must  be  a  tangible,  specifiable  difference  that  defines  and  distinguishes 
the  strata. 

3.  The  exact  number  of  elements  in  each  stratum  must  be  known. 

B.       DESCRIPTION  OF  STRATIFIED  RANDOM  SAMPLING 

Once  the  decision  has  been  made  that  stratification  would  be  beneficial  in  the 
sampling  process,  there  are  several  steps  that  must  be  taken.  These  steps  will  be  briefly 
discussed  below. 

1.  Establish  the  Desired  Precision  and  Reliability 

Statistical  samples  are  evaluated  in  terms  of  "precision,"  which  is  expressed  as 
a  range  of  values,  plus  or  minus,  around  the  sample  result,  and  "reliability"  (or 
confidence),  which  is  expressed  as  the  proportion  of  such  ranges  from  all  possible 
similar  samples  of  the  same  size  that  would  include  the  actual  population  value. 
[Ref.  7:  pg.  4] 

Basically, the statistical measures of precision and reliability have to do with
how accurate and reliable the sampler wants his sample results to be. An example of
the application of these two measures is helpful in understanding the concepts.
Suppose an auditor is designing a statistical test based on a desire to obtain an
estimate of an audited account value to within $10,000. The $10,000 amount reflects
the auditor's judgment as to what would constitute a material deviation in reported
values. In other words, the auditor does not want his estimate of the audited account
value to be more than $10,000 (either plus or minus) away from the true audited
account value. Reliability is a closely related concept. The auditor's goal is not only
to obtain an estimate within the materiality limit of $10,000 but also to be reasonably
sure that this estimate is sound. Because only a sample is observed, judgmentally or
statistically, in most audit situations, certainty is impossible. Generally accepted
auditing standards recognize this by requiring reasonable assurance rather than
certainty. Reliability is the statistical measure of that level of assurance stated as a
proportion. For example, a proportion of 0.95 indicates that the auditor wishes to
achieve a 95% level of reliability that the reported amount is not materially different
(plus or minus $10,000) from the audited amount.

Specification of a probable range for a population parameter (a plus or minus
for error) is crucial in indicating the reliability of estimates. This process involves the
construction of a confidence interval for the population parameter being estimated. An
in-depth look at confidence interval construction is beyond the scope of this study,
however, and it is suggested that the reader consult any good statistics textbook for a
detailed discussion of this topic. For this study, no specific precision and reliability
levels will be set, the purpose of this study being to compare the results of the two
sampling methods to each other rather than to attain some specific level of accuracy
and reliability.

2.  Designate  the  Strata  and  Strata  Boundaries 

For all practical purposes, there is currently no way to select the
optimal number of strata or the strata boundaries [Ref. 4: pg. 158]. Useful rules do
exist, however. Ideally, the auditor prefers to base stratification decisions on the
specific variable of interest. In most audit applications, the variable of interest is the
number of audited account values. There is a problem here, however, in that the
number of audited account values is not actually known until after sampling.
Fortunately, a good substitute for audited account values, namely reported account
values (book values), is usually available. The auditor generally expects a reasonably
high  correlation  between  the  available  reported  account  values  and  the  obtained  audit 
account  values,  and  can  be  reasonably  confident  about  basing  stratification  decisions 
on  the  available  unaudited  reported  account  values.  However,  this  does  limit  the 
benefits  of  stratification  in  that,  all  other  factors  being  equal,  unless  the  correlation 
between  reported  and  audited  account  values  is  perfect,  the  errors  introduced  by  a 
particular  audited  account  value  belonging  to  different  strata  than  the  related  reported 
account  values  will  eventually  negate  any  further  benefits  that  can  be  obtained  by  the 
addition  of  new  strata. 

From a practical viewpoint, then, the identification of strata is a heuristic
process (a sort of educated guess). In an auditing
context, the approach that is most likely to be beneficial is to obtain some idea of the
underlying character and distributional properties of the population of reported
account values (book values). This can be done manually, but the results are much
more meaningful when a computer can be utilized. The various output obtainable
from a computer, along with a basic understanding of the data, may enable the auditor
to subjectively select strata of a reasonable nature. In some cases, the data may lend
themselves to obvious strata divisions, but in most situations this will probably not be
the case.

Even if there are a certain number of obvious strata, say two, there are further
questions to be asked. For example, if the use of two strata contributes to a substantial
decline in the population variance, one might reasonably ask, "If two strata gave good
results in reducing variance, wouldn't the use of four strata give results that are twice
as good?" The answer is that, although an increase to four strata might also be
beneficial, it would probably not lead to as large a reduction in the variance estimate.
In fact, diminishing returns are observed as the number of strata increases. The first
doubling of strata, from one to two, can produce variance reductions of as much as
60% or 70% [Ref. 4: pg. 159]. However, a second and third doubling tend to curtail
the incremental reductions to about 25% [Ref. 4: pg. 159]. Therefore, there is some
point at which the addition of more strata will no longer be useful in reducing variance
estimates, and may in fact increase variance. The only practical way to establish the
limits of strata benefit is by computer simulation. As a general rule, 5 to 10 strata
usually account (depending on the particular population, of course) for most of the
available variance reduction.

Given  the  number  of  strata,  the  auditor  must  then  determine  how  and  where 
to  set  strata  boundaries.  Ideally,  strata  boundaries  should  be  established  on  the  basis 
of  audited  account  values,  as  before.  But  when  these  amounts  are  not  available, 
reported (book) account values are commonly used as the basis for setting most
boundary  values.  This  substitution  will  work  well  if  reported  account  values  and 
audited  account  values  are  closely  correlated. 

Strata boundaries might be set using the equal dollar value per strata rule,
which, as the name implies, means arranging the strata boundaries such that each
stratum has approximately the same dollar value; or, boundaries might be established
based on the equal variance rule, where each stratum has approximately the same
variance measure.
Another rule, sometimes referred to as the Q-SUM or CUSUM rule,
establishes the strata boundaries by first creating a frequency distribution of the
recorded (book) account values. The square root of the frequency of recorded account
values in each category is then computed and summed, and the resulting total is divided
by the desired number of strata. The auditor attempts to create strata by accumulating
the square-root frequency measures in sequence until the cumulated sum (CUSUM) is
approximately equal to the total accumulation divided by the number of strata. The
next stratum is then composed of the next grouping in the sequence such that the
CUSUM is approximately equal to twice the total accumulation divided by the number
of strata. [Ref. 4: pg. 161]
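A rough Python sketch of the CUSUM rule follows. It assumes the frequency distribution is given as class upper bounds with their counts, and it closes a stratum at the first class where the cumulative sum reaches the next target, which is one simple reading of "approximately equal". The function name and the example distribution are hypothetical.

```python
import math


def cusum_boundaries(class_upper_bounds, frequencies, num_strata):
    """Return the upper book-value boundary of every stratum except the last."""
    roots = [math.sqrt(f) for f in frequencies]
    # Total of the square-root frequencies divided by the number of strata.
    step = sum(roots) / num_strata

    boundaries = []
    running = 0.0
    for upper, root in zip(class_upper_bounds, roots):
        running += root
        # The k-th stratum ends where the cumulative sum reaches k * step.
        target = step * (len(boundaries) + 1)
        if len(boundaries) < num_strata - 1 and running >= target:
            boundaries.append(upper)
    return boundaries


# Hypothetical frequency distribution of book values in $100-wide classes.
upper_bounds = [100, 200, 300, 400, 500, 600]
counts = [100, 64, 36, 16, 4, 4]
print(cusum_boundaries(upper_bounds, counts, 4))
```

An auditor applying the rule by hand would likely pick whichever class boundary lies closest to each target rather than the first one past it; the sketch does not attempt that refinement.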

For  this  study,  the  population  will  be  divided  into  10  strata  based  on  the  book 
value  amount  of  the  audit  unit.  Stratification  by  book  amount  is  helpful  when  the 
book  amounts  of  the  audit  units  are  related  to  their  audit  values  [Ref.  3:  pg.  77].  The 
choice  of  10  strata  was  made  to  facilitate  comparison  with  the  Basket  Method  in  that 
10  "baskets"  will  be  used  when  applying  the  Basket  Method  in  this  study. 

Strata  boundaries  will  be  set  using  the  equal  dollar  value  per  strata  rule. 
Again,  this  rule  is  used  to  facilitate  comparison  with  the  Basket  Method  where  basket 
totals  are  nearly  equal  due  to  the  unique  basket  assignment  process.  It  may  seem  that 
if  the  "equal  dollar  value  per  strata"  rule  is  used  that,  conceptually,  there  is  no 
difference  between  the  "strata"  formed  under  Stratified  Random  Sampling  and  the 
"baskets"  formed  using  the  Basket  Method.  There  is  in  fact  a  significant  difference  that 
stems  from  the  distinctive  ways  in  which  the  strata  and  baskets  are  formulated.  Under 
the  Basket  Method,  the  population  is  partitioned  into  baskets  in  such  a  way  that  each 
basket  will  have  approximately  equal  dollar  value  and  contain  approximately  the  same 
number of individual elements. Under the Stratified Random Sampling Method, strata
are also partitioned so that they contain approximately equal dollar value, but the
number of individual elements in each stratum may vary drastically.
3.  Sample  Size  Determination  and  Allocation 

Two methods are generally used to allocate a total sample to individual strata
[Ref. 6: pg. 97]. One method is known as proportional allocation. In this method, the
percentage of the sample allocated to each stratum is the same as the percentage of the
total population accounted for by that stratum. That is,

$n_i = n \times N_i / N$    (eqn 3.1)

where $n_i$ represents the sample size for the ith stratum, $n$ the total sample size, $N_i$ the
number of population items in the ith stratum, and $N$ the total population size.
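As a worked illustration of equation 3.1, the sketch below applies proportional allocation to the ten stratum counts reported in the Appendix B session (3328, 1807, ..., 29) with a total sample of 830. The function name is hypothetical; rounding is left to the caller because rounding conventions at exactly .5 differ between tools.

```python
def proportional_allocation(n, stratum_sizes):
    """Equation 3.1: n_i = n * N_i / N, returned unrounded."""
    total = sum(stratum_sizes)
    return [n * size / total for size in stratum_sizes]

# Stratum counts as listed in Appendix B (they sum to 8300).
sizes = [3328, 1807, 1228, 698, 429, 341, 238, 135, 67, 29]
allocation = proportional_allocation(830, sizes)
for size, n_i in zip(sizes, allocation):
    print(f"N_i = {size:5d}   n_i = {n_i:7.2f}")
```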

A generally more effective method, however, is optimal allocation. Optimal
allocation allocates the total sample to the individual strata on the basis of the
"relative" stratum size, $N_i$, and the stratum standard deviation, $SD_i$.

$n_i = n \times N_i SD_i / \sum_i N_i SD_i$    (eqn 3.2)

In equation 3.2, $SD_i$ represents the standard deviation of stratum i. All other variables
are the same as in equation 3.1.
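Equation 3.2 can be sketched the same way. The stratum sizes and standard deviations below are made-up numbers chosen only to show how a small but highly variable stratum attracts a larger share of the sample; the function name is hypothetical.

```python
def optimal_allocation(n, stratum_sizes, stratum_sds):
    """Equation 3.2: n_i = n * N_i * SD_i / sum(N_i * SD_i), unrounded."""
    weights = [size * sd for size, sd in zip(stratum_sizes, stratum_sds)]
    total_weight = sum(weights)
    return [n * w / total_weight for w in weights]

# Illustrative values only: two strata, the smaller one four times as variable.
print(optimal_allocation(30, [100, 50], [1.0, 4.0]))
```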

Although  the  optimal  allocation  method  is  generally  more  effective,  the 
proportional  allocation  method  will  be  utilized  in  this  study.  Proportional  allocation 
will  give  more  meaningful  results  (for  comparison  with  the  Basket  Method)  for  this 
investigation  given  that  strata  boundaries  are  being  set  using  the  "equal  dollar  value 
per strata rule."  If optimal allocation were used, the sample sizes calculated using
equation 3.2 for two of the strata would be greater than the total number of elements
in those strata. If this were to happen in a real-world sampling situation, the sample
size of each affected stratum would be set equal to its population size and sample sizes
would be recalculated for the remaining strata. The saturated strata would then be
audited 100 percent. Doing so in this study would not facilitate comparison of the two
methods of sampling under "like" circumstances.
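The recalculation described for saturated strata can be sketched as a small loop: any stratum whose optimal allocation exceeds its population is fixed at 100 percent, and the remaining sample is reallocated over the other strata. This is a minimal sketch, not the procedure used in this study (which uses proportional allocation); the function name and the example figures are assumptions.

```python
def optimal_allocation_capped(n, stratum_sizes, stratum_sds):
    """Optimal allocation with saturated strata audited 100 percent."""
    allocation = [0.0] * len(stratum_sizes)
    free = set(range(len(stratum_sizes)))   # strata not yet saturated
    remaining = float(n)                    # sample still to be allocated
    while free:
        denom = sum(stratum_sizes[i] * stratum_sds[i] for i in free)
        if denom == 0:
            break                           # no variability left to allocate on
        trial = {i: remaining * stratum_sizes[i] * stratum_sds[i] / denom
                 for i in free}
        saturated = [i for i in free if trial[i] > stratum_sizes[i]]
        if not saturated:
            for i in free:
                allocation[i] = trial[i]
            break
        for i in saturated:                 # cap, then recalculate the rest
            allocation[i] = float(stratum_sizes[i])
            remaining -= stratum_sizes[i]
            free.remove(i)
    return allocation

# Illustrative values: the second stratum has only 10 items but a large SD.
print(optimal_allocation_capped(60, [100, 10], [1.0, 90.0]))
```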

The  sample  size  computations  depend  on  whether  the  optimal  or  proportional 
allocation  method  is  used.  There  are  equations  to  be  used  for  each  method  in 
calculating  the  appropriate  sample  size  required  to  achieve  a  stated  level  of  precision 
and  reliability.  The  equations  will  not  be  enumerated  here  because  sample  size 
requirement  calculations  are  not  required  to  be  made  for  the  purposes  of  this  study. 
This  is  because  in  actual  applications  where  the  true  value  of  the  population  is  not 
known  the  only  way  to  be  reasonably  certain  that  one's  results  are  valid  is  by 
complying  with  rules  which  will  tie  the  audit  to  statistical  theory.  The  sampling  rules 
for  Stratified  Random  Sampling  are  designed  to  do  just  that,  so  that  the  auditor  who 
follows  the  sampling  procedures  will  be  able  to  determine  the  extent  of  the  audit 
required  to  achieve  the  desired  level  of  certainty. 

In this study the true values of the proposals are known, as are the size and
distribution of errors, and the Stratified Random Sampling method is not being
compared to its theoretical limits, but to a second method to determine which of the
two is the more desirable in a certain case. The sample size in this study will be chosen
arbitrarily, and is further described in the Description of Simulation section.

4.  Select a Random Sample of Size $n_i$ From the Strata

5.  Calculate the Mean of Each Stratum Based On $n_i$ for Each Stratum

6.  Calculate  the  Estimated  Audited  Population  Total 

This  calculation  involves  taking  the  mean  of  each  stratum  (derived  in  step  5 
above),  multiplying  it  by  the  total  number  of  items  in  the  stratum  and  then  summing 
the  results.  This  gives  the  estimated  audited  population  total  which  can  be 
mathematically  represented  as  follows: 


$\sum_i \bar{x}_i N_i$    (eqn 3.3)

where $\bar{x}_i$ represents the mean of stratum i, $N_i$ the total number of items within stratum
i, and $\sum_i \bar{x}_i N_i$ the sum of the products $\bar{x}_i N_i$ over all strata.
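A short sketch of equation 3.3 follows. The audited sample values and stratum sizes are invented for illustration, and the function name is hypothetical.

```python
def estimated_audited_total(stratum_samples, stratum_sizes):
    """Equation 3.3: sum over strata of (sample mean of stratum i) * N_i."""
    total = 0.0
    for sample, size in zip(stratum_samples, stratum_sizes):
        mean = sum(sample) / len(sample)   # step 5: stratum mean from its sample
        total += mean * size               # step 6: expand to the whole stratum
    return total

# Two strata: audited sample values and the number of items in each stratum.
samples = [[10.0, 20.0], [100.0]]
sizes = [5, 2]
print(estimated_audited_total(samples, sizes))  # 15 * 5 + 100 * 2 = 275.0
```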

7.  Check  Reliability  of  the  Estimated  Audited  Population  Total 

This step involves concluding that one is certain at the reliability specified in
Step 1 that the true book value is within the estimated audited population total
plus-or-minus the achieved precision (see footnote 6).


Footnote 6: There is a formula which can be used to calculate the achieved precision.
The achieved precision should always be less than or equal to the desired or acceptable
precision. If the achieved precision is greater than the acceptable precision, the sample
size is insufficient because the precision limit is too wide. If this were the case the
sample size would have to be increased.
IV.  DESCRIPTION  OF  SIMULATION

A.  DERIVING  COMPARABLE  RESULTS 

As mentioned previously in this paper, the desired sampling technique is not
necessarily the one that results in the most accurate average estimate for proposal
populations with varying error arrangements. A more important characteristic of the
desired method is that it responds least to variations in the arrangement of errors.
In other words, it will be the method which is more consistent in its predictions over
various error arrangements and patterns. Since it is the consistency of the drawn
sample which is of interest in this investigation, the sample selection processes of both
the Basket Method and the Stratified Random Sampling Method will be used to draw
samples from the same populations. The samples will then be evaluated according to
the Basket Method, which will give an estimate of the true value of the population, to
see how well each method's sample reflected the value of the population.

The  rules  of  the  basket  method  will  create  the  identical  set  of  baskets  from  a 
given  population  every  time  they  are  applied.  Therefore,  the  "baskets"  of  the  basket 
method  can  easily  be  evaluated  by  a  complete  review  of  the  specific  results.  The  rules 
of  the  Stratified  Random  Sampling  technique  also  provide  a  finite  number  of  samples, 
but  that  number  is  significantly  greater  than  the  number  of  different  baskets. 

To  evaluate  the  sample  drawn  with  the  Stratified  Random  Sampling  rules,  the 
sample  is  treated  as  if  it  were  a  basket.  Then,  the  sample  is  evaluated  using  the  basket 
method  evaluation  technique;  that  is,  all  of  the  sample's  resident  proposals  are  audited 
and  their  true  value  is  divided  by  their  proposal  value.  The  resulting  factor  is 
multiplied  against  the  population  proposal  total  to  determine  the  best  estimate  of  the 
true  total  value  of  the  proposal  population. 

B.  CREATION  OF  THE  TEST  POPULATIONS 

Using the general purpose statistical computing system Minitab, the original
population was seeded with errors at a 5% and 10% rate of occurrence in a random
distribution. The 5% error population (population A) was then skewed to form two
additional test populations. One (population B) had its errors skewed strongly to its
higher valued proposals, and the other (population C) to its lower valued proposals.
The total dollar amount of error and number of overstated proposals remained
constant during the skewing procedure. All populations were created both in a
"dishonest" version, which had overstatements only and is named with a single letter
(populations A, B, C, D, and E), and in an "honest" version, which had both
overstatements and understatements and is named with a double letter (AA, BB, CC,
DD, and EE). Except for the sign on each error, the single letter named populations
are identical to their double letter named counterparts. Therefore, populations AA,
BB, and CC differ from their single lettered counterparts only in the fact that they
contain errors of both overstatement and understatement. The populations with 10%
errors differ in that E and EE, while containing the same number of errors in the same
distribution and sign as D and DD respectively, have errors of much larger magnitude,
so that the sum of the dollar value of the errors makes up 10% of the population in E
and EE but only 1% of the population in D and DD.  The populations are described in
Table 1.

TABLE  1
POPULATION  DESCRIPTION

NAME                            A        B        C        D        E
Population                      8,300    8,300    8,300    8,300    8,300
Erroneous Proposals             5%       5%       5%       10%      10%
Σ $ Errors / Σ $ Proposals      9%       9%       9%       1%       10%
Types of Errors                 +        +        +        +        +
Skew (none, high, or low)       N        H        L        N        N

NAME                            AA       BB       CC       DD       EE
Population                      8,300    8,300    8,300    8,300    8,300
Erroneous Proposals             5%       5%       5%       10%      10%
Σ $ Errors / Σ $ Proposals      9%       9%       9%       1%       10%
Types of Errors                 +/-      +/-      +/-      +/-      +/-
Skew (none, high, or low)       N        H        L        N        N


C.       SIMULATION  EXECUTION 

Simulations were run on all populations using both a Basket Method evaluating
program and a Stratified Random Sampling procedure which was done manually
within Minitab.  The Basket Method program utilized was written by Lieutenant James
P. Tortorelii for use in his thesis at the Naval Postgraduate School [Ref. 2: pg. 13].
This program, written in Waterloo BASIC, is listed in Appendix A. The Stratified
Random Sampling simulation process is detailed in Appendix B and consists of the
following basic steps. Each step is referenced by line number to its actual application
in Appendix B:

1.  Stratify  the  Population 
(Appendix  B  line  numbers  6  -  45) 

2.  Allocate  the  Total  Sample  to  the  Strata
(Line  numbers  48  -  80)

3.  Select  a  Random  Sample  From  Each  Stratum
(Line  numbers  81  -  130)

4.  Calculate  the  "Book  Value"  sum  for  Audited  Items 
(Line  numbers  131  -  133) 

5.  Calculate  the  "Audit  Value"  sum  for  Audited  Items 
(Line  numbers  134  -  136) 

6.  Calculate  the  Correction  Factor 
(Line  numbers  137  -  139) 

7.  Calculate  the  Predicted  Population  Audit  Total 
(Line  numbers  140  -  142) 

8.  Calculate  the  Percent  Error 
(Line  numbers  143  -  148) 
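The eight steps above can be sketched as one Python function. This is not a translation of the Minitab session in Appendix B; it is a minimal sketch under stated assumptions: strata are cut by cumulative book value (one simple reading of the "equal dollar value" rule), allocation is proportional with ordinary rounding, and every non-empty stratum contributes at least one item. All function names and the demonstration data are hypothetical.

```python
import random

def stratify_equal_dollar(pairs, n_strata):
    """Step 1: split (book, audit) pairs into strata of similar total book value."""
    ordered = sorted(pairs, key=lambda p: p[0])
    total = sum(book for book, _ in ordered)
    strata = [[] for _ in range(n_strata)]
    cumulative = 0.0
    for book, audit in ordered:
        k = min(int(cumulative * n_strata / total), n_strata - 1)
        strata[k].append((book, audit))
        cumulative += book
    return strata

def srs_trial(pairs, n_strata, sample_size, rng):
    """Steps 2-8: allocate, sample, and return the percent error of one trial."""
    strata = stratify_equal_dollar(pairs, n_strata)
    sample = []
    for stratum in strata:
        if not stratum:
            continue
        n_i = round(sample_size * len(stratum) / len(pairs))  # step 2
        n_i = max(1, min(n_i, len(stratum)))                  # assumption: 1..N_i
        sample.extend(rng.sample(stratum, n_i))               # step 3
    book_sum = sum(book for book, _ in sample)                # step 4
    audit_sum = sum(audit for _, audit in sample)             # step 5
    factor = audit_sum / book_sum                             # step 6
    psum = sum(book for book, _ in pairs)
    actual = sum(audit for _, audit in pairs)
    predicted = factor * psum                                 # step 7
    return 100.0 * (predicted - actual) / psum                # step 8

# Made-up population: every 20th proposal is overstated by 50 percent.
rng = random.Random(1)
population = []
for i in range(1, 1001):
    audit = float(i)
    book = audit * 1.5 if i % 20 == 0 else audit
    population.append((book, audit))
print(round(srs_trial(population, 10, 100, rng), 3))
```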

As  mentioned  earlier,  ten  baskets  were  arbitrarily  chosen  for  the  Basket 
Method;  this  resulted  in  830  proposals  per  basket.  Ten  strata  were  then  chosen  for  the 
Stratified  Random  Sampling  Method  with  a  total  sample  size  of  830  to  be  selected. 
Strata  boundaries  were  set  using  the  "equal  dollar  value  per  strata"  rule;  therefore,  each
stratum  has  approximately  the  same  total  dollar  amount  contained  within  it.  This  rule
was  used  because  it  allows  better  comparison  with  the  Basket  Method  in  that  each 
"basket"  formed  under  the  Basket  Method  sample  selection  process  has  nearly  equal 
dollar  basket  totals.  Ten  trials  were  run  using  the  Stratified  Random  Sampling 
selection  method. 

The ten audit results for each sample selected by the two methods were then
divided by their respective proposal sums to derive the correction factors as follows.

$F = \text{Total audit sum of sample} / \text{Total proposal sum of sample}$    (eqn 4.1)

These correction factors were multiplied by the sum of all proposals to derive the
predicted true audit total for the population.  That is,

$PTAT = F \times PSUM$    (eqn 4.2)

where PTAT represents the predicted true audit total for the population, F the
correction factor, and PSUM the sum of all proposals. The difference between the
predicted true audit total and the actual audit total was then divided by the sum of all
proposals to give a percent error for each basket and trial. This calculation can be
mathematically represented as follows:

$PE = (PTAT - AAT) / PSUM$    (eqn 4.3)

where PE represents percent error, PTAT the predicted true audit total for the
population, AAT the actual audit total for the population, and PSUM the sum of all
proposals. The mean percent error for each method was then calculated in the
following manner:

$MPE = \sum PE / 10$    (eqn 4.4)

where MPE represents the mean percent error and $\sum PE$ the sum of the individual
percent error amounts for each trial. The mean percent errors for each method by
population are listed in Table 2.
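Equations 4.1 through 4.4 can be checked against the figures printed in the Appendix B session (sample book sum 41312.6, sample audit sum 37712.6, population book total 409605, population audit total 372255). The sketch below is illustrative; the function names are hypothetical. Two details are taken from the appendices rather than from the equations themselves: the result of equation 4.3 is multiplied by 100 to express it as a percentage (Appendix B, line 146), and the Appendix A program accumulates ABS(ERRORP(I)) before averaging, so the mean here is taken over absolute values.

```python
def correction_factor(sample_audit_sum, sample_book_sum):
    """Equation 4.1."""
    return sample_audit_sum / sample_book_sum

def predicted_true_audit_total(factor, psum):
    """Equation 4.2."""
    return factor * psum

def percent_error(ptat, aat, psum):
    """Equation 4.3, scaled by 100 as in Appendix B."""
    return 100.0 * (ptat - aat) / psum

def mean_abs_percent_error(errors):
    """Equation 4.4 applied to absolute values, as the Appendix A program does."""
    return sum(abs(e) for e in errors) / len(errors)

f = correction_factor(37712.6, 41312.6)
ptat = predicted_true_audit_total(f, 409605)
pe = percent_error(ptat, 372255, 409605)
print(f, ptat, pe)

# Basket Method percent errors for population A, copied from Appendix C.
basket_a = [-1.211, 2.307, 0.769, 0.330, 0.330,
            -1.651, -0.330, 0.110, -0.330, -0.330]
print(mean_abs_percent_error(basket_a))
```

Averaging the absolute values of the ten Basket Method errors for population A gives approximately .770, which is the figure reported in Table 2.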

TABLE  2
SIMULATION  RESULTS  (MEAN  %  ERROR)

NAME               A         B         C         D         E
Basket Method      .770      .571      .725      .089      .629
SRS               -.898     1.011      .783     -.078     -.659

NAME               AA        BB        CC        DD        EE
Basket Method      .901      .853     1.099      .065      .774
SRS               -.936    -1.253     1.367     -.105      .971

Detailed  results  are  shown  in  Appendix  C.  A  positive  percent  error  represents
an  overestimate  and  a  negative  percent  error  represents  an  underestimate.  With  the 
exception  of  population  D,  the  Basket  Method  of  sample  selection  was  always  more 
accurate  with  overstatement  errors.  For  the  populations  with  both  overstatement  and 
understatement  errors,  the  Basket  Method  was  more  accurate  across  the  board.  The 
data  from  Table  2  are  perhaps  more  vividly  illustrated  when  expressed  in  a  different 
manner.  The  Basket  Method  errors  are  expressed  as  a  percent  of  the  Stratified 
Random  Sampling  errors  in  Table  3. 

TABLE  3

BASKET  METHOD  ERROR  AS  A  %
OF  STRATIFIED  RANDOM  SAMPLING  ERROR

NAME        A       B       C       D       E
Percent     85      56      92      114     95

NAME        AA      BB      CC      DD      EE
Percent     96      68      80      61      79
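The Table 3 figures can be reproduced from Table 2 with a short calculation. The sketch assumes two things that the text does not state explicitly but that match every entry: the ratio is taken between the magnitudes of the mean percent errors, and the result is truncated (not rounded) to a whole percent.

```python
# Mean percent errors as printed in Table 2.
basket = {"A": 0.770, "B": 0.571, "C": 0.725, "D": 0.089, "E": 0.629,
          "AA": 0.901, "BB": 0.853, "CC": 1.099, "DD": 0.065, "EE": 0.774}
srs = {"A": -0.898, "B": 1.011, "C": 0.783, "D": -0.078, "E": -0.659,
       "AA": -0.936, "BB": -1.253, "CC": 1.367, "DD": -0.105, "EE": 0.971}

def basket_as_percent_of_srs(basket_error, srs_error):
    """Basket Method error magnitude as a truncated percent of the SRS magnitude."""
    return int(100 * abs(basket_error) / abs(srs_error))

table3 = {name: basket_as_percent_of_srs(basket[name], srs[name])
          for name in basket}
print(table3)
```

A value below 100 means the Basket Method's mean error was smaller than that of Stratified Random Sampling for that population; only population D exceeds 100.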

The  average  or  mean  value  (in  this  case  mean  percent  error)  in  a  set  of 
measurements  is  only  one  important  summary  figure.  It  is  also  important  to 
summarize  the  extent  to  which  values  differ  among  themselves  or  about  a  central  value. 
One  of  the  most  useful  statistical  measures  of  variability  is  the  standard  deviation. 
This measure is based on the concept of deviations from the mean.  The deviation of a
sample measurement $y_i$ from its mean $\bar{y}$ is defined as $(y_i - \bar{y})$.  The standard
deviation of a sample of n measurements $y_1, y_2, \ldots, y_n$ is defined to be the square
root of the sum of the squared deviations divided by (n - 1). The standard deviation, s,
can be denoted as follows.

$s = \sqrt{\sum_{i=1}^{n} (y_i - \bar{y})^2 / (n - 1)}$    (eqn 4.5)

As previously mentioned, the measure of standard deviation may be used to show the
degree of variation among values in a given set of data, or it may be used to
supplement an average to describe a group of data. It also may be used to compare
one group of data with another.  When the standard deviation is high, the average
(mean) is of less significance as a statistical measure.  When the standard deviation is
low, the value of the average is considered to be a highly representative value.
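Equation 4.5 is the ordinary sample standard deviation, which Python's standard library provides as `statistics.stdev`. The sketch below applies it to the ten Basket Method percent errors for population A from Appendix C. Taking absolute values first is an assumption made here because it reproduces the corresponding Table 4 entry (.725); the text does not say which form was used.

```python
import math
import statistics

# Basket Method percent errors for population A, copied from Appendix C.
basket_a = [-1.211, 2.307, 0.769, 0.330, 0.330,
            -1.651, -0.330, 0.110, -0.330, -0.330]

def sample_std_dev(values):
    """Equation 4.5 written out term by term."""
    n = len(values)
    mean = sum(values) / n
    return math.sqrt(sum((y - mean) ** 2 for y in values) / (n - 1))

magnitudes = [abs(e) for e in basket_a]
s_manual = sample_std_dev(magnitudes)
s_library = statistics.stdev(magnitudes)   # same (n - 1) definition
print(s_manual, s_library)
```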

The  standard  deviations  of  the  percent  error  for  each  method  were  calculated 
using  the  data  from  Appendix  C.    The  results  are  given  in  Table  4. 

TABLE  4
STANDARD  DEVIATION  OF  MEAN  PERCENT  ERROR

NAME               A         B         C         D         E
Basket Method      .725      .468      .473      .083      .462
SRS                .698      .584      .381      .059      .470

NAME               AA        BB        CC        DD        EE
Basket Method      .530      .796      .800      .055      .601
SRS                .915     1.028     1.332      .062      .586


In  looking  at  the  results  in  Table  4,  the  significance  of  the  standard  deviation  figures 
lies  not  so  much  in  whether  they  are  considered  to  be  high  or  low;  the  significance  lies 
in  the  comparable  sizes  of  the  standard  deviations  between  the  Basket  Method  and  the 
Stratified  Random  Sampling  Method.  What  this  means  is  that  the  mean  percent  errors 
for  both  methods  have  about  the  same  "representativeness"  as  far  as  being  a  good 
summary  statistic.  This  lends  more  credibility  to  the  simulation  results  as  a  basis  for 
comparison  of  the  two  methods. 
V.  SUMMARY  AND  CONCLUSIONS

A.  RESISTANCE  TO  PROPOSAL  RIGGING 

In  order  to  benefit  from  the  potential  time  and  labor  savings  a  sampling  system 
offers,  the  sampling  technique  must  be  resistant  to  padding  schemes.  If  not,  a 
dishonest  contractor  has  much  to  gain  by  trying  to  selectively  pad  proposals. 
Therefore,  as  mentioned  previously,  the  primary  goal  is  not  necessarily  to  determine 
which  of  the  two  methods  is  the  most  accurate,  but  to  see  which  one  least  benefits 
attempted  padding  schemes.  When  comparing  a  method's  performance  between  the 
single  and  double  letter  versions  of  a  population  it  can  be  seen  in  Table  2  that  both  the 
Basket  Method  (with  the  exception  of  population  D)  and  the  Stratified  Random 
Sampling  method  are  stricter  when  estimating  the  value  of  the  overstatement-only 
population  than  when  estimating  the  value  of  the  "honest"  populations.  Therefore, 
padding  one's  contract  proposals  with  overstatements  in  random,  low,  or  high  skewed 
distributions  prior  to  submitting  them  to  either  method  for  evaluation  is  not  likely  to 
raise  the  resulting  estimate  for  the  population,  but  is  instead  likely  to  lower  the 
estimated value. However, except for population D, the samples drawn with Stratified
Random Sampling allowed the overstatement-only (padded) populations a larger
estimate than did the samples drawn with the Basket Method.

B.  EVALUATION 

Assuming  honest  contractors  are  as  likely  to  understate  as  overstate  their  costs 
and  dishonest  contractors  are  not,  honest  contractors  will  be  more  successful  (except 
for  populations  D  and  DD  under  the  Basket  Method)  than  dishonest  contractors  under 
either  of  the  sampling  methods.  Since  the  Basket  Method  allows  less  benefit  to  accrue 
to  the  dishonest  contractor  than  Stratified  Random  Sampling,  and  because  it  gives  a 
more  accurate  estimate  in  general,  the  Basket  Method  is  judged  to  be  a  more  desirable 
sampling  method  for  the  purposes  addressed  in  this  paper. 
C.       AREAS  FOR  FURTHER  RESEARCH

Some  suggestions  for  further  study  are: 

1.  Fewer  or  more  baskets  and  strata  may  be  used. 

2.  The  Basket  Method  can  be  compared  to  other  sampling  methods. 

3.  A  data  set  with  much  smaller  variance  in  proposal  size  could  be  used. 

4.  Additional  error  arrangement  strategies  can  be  developed  and  tested. 
APPENDIX  A
BASKET  METHOD  PROGRAM  LISTING 

00100 REM THIS IS A PROGRAM TO PROCESS DATA USING THE BASKET METHOD.
00120 REM DATA IS INPUT FROM A FILE, SEPARATED BY COMMAS, AND LISTED
00140 REM AS PAIRS OF VALUES FOR A BID, THE BOOK FIRST AND THE
00160 REM AUDITED VALUE SECOND.  THE PROGRAM EXPECTS 'DATPOP' PAIRS
00180 REM OF VALUES.  DATA MUST BE IN DESCENDING ORDER BY BOOK VALUE.
00200 REM
00220 REM   **  DIMENSION VARIABLES  **
00240 REM
00260 DIM ASUM(50), BSUM(50), ANEXT(50), BNEXT(50)
00280 DIM ERRORP(50), FACTOR(50), ERRORA(50)
00300 REM
00320 REM   **  SET CONSTANTS  **
00340 REM
00360 B = 10               ! NUMBER OF BASKETS
00380 DATPOP = 8300        ! NUMBER OF DATA PAIRS
00400 BPOP = INT(DATPOP/B) ! INITIATE RUNNING TALLY OF DATA PAIRS READ
00420 OPEN #3, 'TEST (RECFM F LRECL 80)', INPUT
00440 ATOT = 0
00460 BTOT = 0
00480 BPOP1 = 1
00500 FOR J = 1 TO 10
00520    ASUM(J) = 0
00540    BSUM(J) = 0
00560    NEXT J
00580 EES = 0              ! SUM OF ERROR SQUARES
00600 EED = 0              ! SUM OF BASKET DOLLAR SQUARES
01000 REM
01020 REM   **  ROUTINE TO READ IN DATA  **
01040 REM
01060 IF BPOP1 > BPOP
01080    GOTO 4000         ! IF NO MORE DATA, THEN PROCESS
01100    END IF
01120 FOR I = 1 TO B
01140    INPUT #3, BNEXT(I), ANEXT(I)
01160    NEXT I
01180 BPOP1 = BPOP1 + 1
02000 REM
02020 REM   **  ROUTINE TO SORT PARTIAL SUMS IN  **
02040 REM   **    BASKETS IN ASCENDING ORDER     **
02060 REM
02080 I = 1
02100 WHILE I < B
02120    IF BSUM(I) > BSUM(I+1)
02140       C1 = BSUM(I)
02160       C2 = ASUM(I)
02180       BSUM(I) = BSUM(I+1)
02200       ASUM(I) = ASUM(I+1)
02220       BSUM(I+1) = C1
02240       ASUM(I+1) = C2
02260       IF I > 1
02280          I = I - 1
02300          END IF
02320       GOTO 2120
02340       END IF
02360    I = I + 1
02380    ENDLOOP
03000 REM
03020 REM   **  ADD NEXT ROUND TO BASKETS  **
03040 REM
03060 FOR I = 1 TO B
03080    BSUM(I) = BSUM(I) + BNEXT(I)
03100    ASUM(I) = ASUM(I) + ANEXT(I)
03120    NEXT I
03140 GOTO 1060
04000 REM
04020 REM   **  ADDING ROUTINE  **
04040 REM
04060 FOR I = 1 TO B
04080    BTOT = BTOT + BSUM(I)
04100    ATOT = ATOT + ASUM(I)
04120    NEXT I
04140 FOR I = 1 TO B
04160    FACTOR(I) = ASUM(I)/BSUM(I)
04180    ERRORA(I) = BTOT * FACTOR(I) - ATOT
04185    EES = EES + ERRORA(I) * ERRORA(I)
04190    EED = EED + BSUM(I) * BSUM(I)
04200    ERRORP(I) = 100 * ERRORA(I)/BTOT
04220    MAE = MAE + ABS(ERRORA(I))
04240    MPE = MPE + ABS(ERRORP(I))
04260    NEXT I
04280 MAE = MAE / B
04300 MPE = MPE / B
05000 REM
05020 REM   **  PRINT RESULTS  **
05040 REM
05060 PRINT 'BASKET   BOOK VALUE   AUDIT VALUE   FACTOR   %ERROR   ERR OF F
05080 FORM0$ = 'TOTAL   #######.##    #######.##    #.####  ##.####   #####.##'
05100 FORM1$ = '   ##    #######.##    #######.##    #.####  #.####   #####.##'
05120 FORM2$ = 'MEAN  #.####  #####.##'
05140 PRINT USING FORM0$, BTOT, ATOT, ATOT/BTOT, 100*(BTOT-ATOT)/BTOT, &
05160 &       BTOT-ATOT
05180 FOR I = 1 TO B
05200    PRINT USING FORM1$, I, BSUM(I), ASUM(I), FACTOR(I), ERRORP(I), &
05220 &       ERRORA(I)
05240    NEXT I
05260 PRINT USING FORM2$, MPE, MAE
05280 FORM3$ = '   DOLLARS       #######.##       #######.##'
05300 FORM4$ = 'CONTRACTS       #######.##       #######.##'
06000 PRINT ' MEAN AUDITED S. D.'
06020 PRINT USING FORM3$, BTOT/B, SQR(((B*EED)-(BTOT*BTOT))/(B*(B-1)))
06040 PRINT USING FORM4$, DATPOP/B, 0
06060 REM
06080 REM   **  CLEANUP  **
06100 REM
06120 CLOSE #3
07000 END
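For readers without access to Waterloo BASIC, the core of the listing above (read one round of B pairs, sort the baskets by running book sum in ascending order, add the round to the baskets, then compute a factor and percent error per basket) can be sketched in Python. This is an illustrative re-expression, not the program used in the study: the function name and the six-proposal data set are made up, and the input is taken from a list rather than a file.

```python
def basket_method(pairs, n_baskets):
    """pairs: (book, audit) tuples already sorted in descending order by book.

    Returns the baskets as [book_sum, audit_sum], the percent error of each
    basket, and the mean absolute percent error.
    """
    rounds = len(pairs) // n_baskets          # like BPOP = INT(DATPOP/B)
    baskets = [[0.0, 0.0] for _ in range(n_baskets)]
    for r in range(rounds):
        # Stable ascending sort: the smallest basket receives the largest
        # proposal of the next round.
        baskets.sort(key=lambda b: b[0])
        chunk = pairs[r * n_baskets:(r + 1) * n_baskets]
        for basket, (book, audit) in zip(baskets, chunk):
            basket[0] += book
            basket[1] += audit
    btot = sum(b[0] for b in baskets)
    atot = sum(b[1] for b in baskets)
    pct_errors = [100.0 * (btot * (audit / book) - atot) / btot
                  for book, audit in baskets]
    mpe = sum(abs(e) for e in pct_errors) / n_baskets
    return baskets, pct_errors, mpe

# Made-up data: six proposals, the largest overstated by 2.
demo = [(10.0, 8.0), (9.0, 9.0), (8.0, 8.0), (7.0, 7.0), (6.0, 6.0), (5.0, 5.0)]
baskets, pct_errors, mpe = basket_method(demo, 3)
print(baskets)
print(pct_errors, mpe)
```

In this small example every basket ends with a book sum of 15, which illustrates why basket totals come out nearly equal under the assignment rule.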
APPENDIX  B
MINITAB  SIMULATION  OF  STRATIFIED  RANDOM  SAMPLING 

1)    MTB > name c22 'book' c50 'audit'
2)    MTB > sum c22 k1
3)    SUM     =    409605
4)    MTB > sum c50 k2
5)    SUM     =    372255
6)    MTB > copy c22 c50 c51 c52;
7)    SUBC> use 'book' = .50:18.75.
8)    MTB > count c51 k3
9)    COUNT   =    3328.0
10)   MTB > copy c22 c50 c53 c54;
11)   SUBC> use 'book' = 18.76:27.21.
12)   MTB > count c53 k4
13)   COUNT   =    1807.0
14)   MTB > copy c22 c50 c55 c56;
15)   SUBC> use 'book' = 27.22:43.50.
16)   MTB > count c55 k5
17)   COUNT   =    1228.0
18)   MTB > copy c22 c50 c57 c58;
19)   SUBC> use 'book' = 43.51:79.00.
20)   MTB > count c57 k6
21)   COUNT   =    698.00
22)   MTB > copy c22 c50 c59 c60;
23)   SUBC> use 'book' = 79.01:107.35.
24)   MTB > count c59 k7
25)   COUNT   =    429.00
26)   MTB > copy c22 c50 c61 c62;
27)   SUBC> use 'book' = 107.36:141.36.
28)   MTB > count c61 k8
29)   COUNT   =    341.00
30)   MTB > copy c22 c50 c63 c64;
31)   SUBC> use 'book' = 141.37:219.82.
32)   MTB > count c63 k9
33)   COUNT   =    238.00
34)   MTB > copy c22 c50 c65 c66;
35)   SUBC> use 'book' = 219.83:436.65.
36)   MTB > count c65 k10
37)   COUNT   =    135.00
38)   MTB > copy c22 c50 c67 c68;
39)   SUBC> use 'book' = 436.66:906.31.
40)   MTB > count c67 k11
41)   COUNT   =    67.000
42)   MTB > copy c22 c50 c69 c70;
43)   SUBC> use 'book' = 906.32:2440.00.
44)   MTB > count c69 k12
45)   COUNT   =    29.000
46)   MTB > let k13 = 830
47)   MTB > let k14 = 8300
48)   MTB > let k15 = k14 * (k3/k14)
49)   MTB > round k15 k15
50)   ANSWER  =    3328.0000
51)   MTB > let k15 = k13 * (k3/k14)
52)   MTB > round k15 k15
53)   ANSWER  =    333.0000
54)   MTB > let k16 = k13 * (k4/k14)
55)   MTB > round k16 k16
56)   ANSWER  =    181.0000
57)   MTB > let k17 = k13 * (k5/k14)
58)   MTB > round k17 k17
59)   ANSWER  =    123.0000
60)   MTB > let k18 = k13 * (k6/k14)
61)   MTB > round k18 k18
62)   ANSWER  =    70.0000
63)   MTB > let k19 = k13 * (k7/k14)
64)   MTB > round k19 k19
65)   ANSWER  =    43.0000
66)   MTB > let k20 = k13 * (k8/k14)
67)   MTB > round k20 k20
68)   ANSWER  =    34.0000
69)   MTB > let k21 = k13 * (k9/k14)
70)   MTB > round k21 k21
71)   ANSWER  =    24.0000
72)   MTB > let k22 = k13 * (k10/k14)
73)   MTB > round k22 k22
74)   ANSWER  =    13.0000
75)   MTB > let k23 = k13 * (k11/k14)
76)   MTB > round k23 k23
77)   ANSWER  =    7.0000
78)   MTB > let k24 = k13 * (k12/k14)
79)   MTB > round k24 k24
80)   ANSWER  =    3.0000
81)   MTB > sample k15 c51 c52 c71 c72
82)   MTB > sum c71 k25
83)   SUM     =    4025.3
84)   MTB > sum c72 k26
85)   SUM     =    4025.3
86)   MTB > sample k16 c53 c54 c73 c74
87)   MTB > sum c73 k27
88)   SUM     =    4135.1
89)   MTB > sum c74 k28
90)   SUM     =    4135.1
91)   MTB > sample k17 c55 c56 c75 c76
92)   MTB > sum c75 k29
93)   SUM     =    4099.2
94)   MTB > sum c76 k30
95)   SUM     =    4099.2
96)   MTB > sample k18 c57 c58 c77 c78
97)   MTB > sum c77 k31
98)   SUM     =    4081.9
99)   MTB > sum c78 k32
100)  SUM     =    4081.9
101)  MTB > sample k19 c59 c60 c79 c80
102)  MTB > sum c79 k33
103)  SUM     =    4127.1
104)  MTB > sum c80 k34
105)  SUM     =    2597.1
106)  MTB > sample k20 c61 c62 c81 c82
107)  MTB > sum c81 k35
108)  SUM     =    4048.2
109)  MTB > sum c82 k36
110)  SUM     =    2608.2
111)  MTB > sample k21 c63 c64 c83 c84
112)  MTB > sum c83 k37
113)  SUM     =    4264.7
114)  MTB > sum c84 k38
115)  SUM     =    3994.7
116)  MTB > sample k22 c65 c66 c85 c86
117)  MTB > sum c85 k39
118)  SUM     =    3937.9
119)  MTB > sum c86 k40
120)  SUM     =    3757.9
121)  MTB > sample k23 c67 c68 c87 c88
122)  MTB > sum c87 k41
123)  SUM     =    4700.3
124)  MTB > sum c88 k42
125)  SUM     =    4520.3
126)  MTB > sample k24 c69 c70 c89 c90
127)  MTB > sum c89 k43
128)  SUM     =    3892.9
129)  MTB > sum c90 k44
130)  SUM     =    3892.9
131)  MTB > let k45 = k25 + k27 + k29 + k31 + k33 + k35 + k37 + k39 + k41 + k43
132)  MTB > prin k45
133)  K45     41312.6
134)  MTB > let k46 = k26 + k28 + k30 + k32 + k34 + k36 + k38 + k40 + k42 + k44
135)  MTB > prin k46
136)  K46     37712.6
137)  MTB > let k47 = k46/k45
138)  MTB > prin k47
139)  K47     0.912859
140)  MTB > let k48 = k47 * k1
141)  MTB > prin k48
142)  K48     373912
143)  MTB > let k49 = (k48 - k2)/k1
144)  MTB > prin k49
145)  K49     0.00404459
146)  MTB > let k50 = k49 * 100
147)  MTB > prin k50
148)  K50     0.404459
APPENDIX  C
DETAILED  RESULTS 


POPULATION  A

Total Book Value:   $409,605.73
Total Audit Value:  $372,255.73

          Percent Error Using    Percent Error Using
Trial     Basket Method          SRS
  1          -1.211                  .404
  2           2.307                  .609
  3            .769                 -.057
  4            .330                  .943
  5            .330                -2.429
  6          -1.651                  .375
  7           -.330                -1.438
  8            .110                -1.397
  9           -.330                  .609
 10           -.330                 -.663
Mean           .770                 -.898

POPULATION  AA

Total Book Value:   $409,605.73
Total Audit Value:  $410,055.73

          Percent Error Using    Percent Error Using
Trial     Basket Method          SRS
  1            .110                 -.110
  2           -.769                 -.325
  3           1.648                 -.109
  4           -.989                -1.843
  5           1.648                -1.644
  6          -1.210                  .562
  7          -1.210                -1.838
  8           -.330                 2.501
  9            .549                  .106
 10            .549                 -.326
Mean           .901                 -.936

POPULATION  B

Total Book Value:   $409,515.73
Total Audit Value:  $372,255.73

          Percent Error Using    Percent Error Using
Trial     Basket Method          SRS
  1           1.187                  .540
  2          -1.231                -1.295
  3            .088                  .447
  4            .303                 2.016
  5           1.187                 -.540
  6            .088                  .103
  7           -.132                 1.403
  8           -.352                 1.106
  9           -.571                -1.440
 10           -.571                -1.217
Mean           .571                 1.011

POPULATION  BB

Total Book Value:   $409,515.73
Total Audit Value:  $408,975.73

          Percent Error Using    Percent Error Using
Trial     Basket Method          SRS
  1           1.890                 -.092
  2            .352                -3.139
  3           -.527                -1.682
  4            .132                 -.085
  5           -.747                 1.042
  6           -.083                  .572
  7          -2.068                -1.415
  8           -.038                 1.663
  9           1.890                 2.492
 10           -.747                  .346
Mean           .853                -1.253

POPULATION C

Total Book Value:   $409,605.73
Total Audit Value:  $372,255.73

          Percent Error Using    Percent Error Using
Trial     Basket Method          SRS

1             .769                  -.173
2             .549                   .565
3            -.110                  -.706
4             .110                 -1.056
5             .549                  1.182
6            -.549                  1.250
7            -.769                 -1.232
8            -.989                   .747
9           -1.211                   .367
10           1.648                  -.557

Mean          .725                   .783


POPULATION CC

Total Book Value:   $409,605.73
Total Audit Value:  $409,155.73

          Percent Error Using    Percent Error Using
Trial     Basket Method          SRS

1           -1.210                   .994
2            2.527                  4.715
3            -.769                 -1.679
4           -2.310                  -.323
5            -.110                  -.555
6             .110                  -.994
7            -.549                 -1.245
8            1.428                   .329
9            -.549                  2.297
10           1.428                   .536

Mean         1.099                  1.367

POPULATION D

Total Book Value:   $375,991.73
Total Audit Value:  $372,256.73

          Percent Error Using    Percent Error Using
Trial     Basket Method          SRS

1             .108                  -.201
2             .003                  -.096
3            -.024                  -.031
4            -.096                   .037
5            -.048                   .117
6            -.132                   .127
7             .237                   .039
8            -.024                  -.015
9             .048                  -.033
10           -.119                  -.037

Mean          .089                  -.078

POPULATION DD

Total Book Value:   $375,991.73
Total Audit Value:  $376,360.73

          Percent Error Using    Percent Error Using
Trial     Basket Method          SRS

1            -.026                   .048
2            -.086                  -.170
3             .057                  -.013
4             .033                  -.133
5            -.134                  -.015
6            -.002                  -.156
7            -.014                  -.170
8             .059                  -.145
9            -.062                   .076
10            .177                   .126

Mean          .065                  -.105

POPULATION E

Total Book Value:   $413,756.73
Total Audit Value:  $372,256.73

          Percent Error Using    Percent Error Using
Trial     Basket Method          SRS

1            -.433                   .146
2            -.346                   .103
3             .967                  -.944
4            -.362                  -.404
5           -1.090                   .434
6             .725                  -.231
7            1.450                  -.601
8            -.121                  1.170
9            -.121                 -1.279
10           -.121                 -1.279

Mean          .629                  -.659

POPULATION EE

Total Book Value:   $413,756.73
Total Audit Value:  $417,856.73

          Percent Error Using    Percent Error Using
Trial     Basket Method          SRS

1             .338                  1.267
2           -1.961                  -.375
3             .822                  1.001
4            -.024                  1.295
5             .943                  1.295
6            1.305                 -1.946
7           -1.111                   .105
8            -.024                 -1.471
9            -.749                   .459
10            .459                   .500

Mean          .774                   .971